A Robust Functional EM Algorithm for Incomplete Panel Count Data
Panel count data describes aggregated counts of recurrent events observed at discrete time points. To understand the dynamics of health behaviors and predict future negative events, the field of quantitative behavioral research has evolved to increasingly rely upon panel count data collected via multiple self-reports, for example, about frequencies of smoking, using in-the-moment surveys on mobile devices. However, missing reports are common and present a major barrier to downstream statistical learning.
Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption
Matrix completion is often applied to data with entries missing not at random (MNAR). For example, consider a recommendation system where users tend to only reveal ratings for items they like. In this case, a matrix completion method that relies on entries being revealed at uniformly sampled row and column indices can yield overly optimistic predictions of unseen user ratings. Recently, various papers have shown that we can reduce this bias in MNAR matrix completion if we know the probabilities of different matrix entries being missing. These probabilities are typically modeled using logistic regression or naive Bayes, which make strong assumptions and lack guarantees on the accuracy of the estimated probabilities.
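The bias the abstract describes, and the role of missingness probabilities in correcting it, can be illustrated with a minimal inverse propensity scoring (IPS) sketch. This is an illustration of the general debiasing idea, not the paper's method; the synthetic ratings, the propensity model tying reveal probability to rating value, and all variable names are assumptions for the example.

```python
# Minimal sketch (not the paper's method): under MNAR, averaging only the
# revealed entries is biased; reweighting each revealed entry by the inverse
# of its (assumed known) missingness probability removes that bias.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings matrix; higher ratings are revealed more often (MNAR).
ratings = rng.integers(1, 6, size=(200, 200)).astype(float)
propensity = np.clip(ratings / 5.0, 0.2, 0.9)  # P(entry revealed)
mask = rng.random(ratings.shape) < propensity  # which entries we observe

# Naive estimate of the mean rating: averages revealed entries only,
# so it is pulled upward by the over-revealed high ratings.
naive_mean = ratings[mask].mean()

# IPS estimate: weight each revealed entry by 1 / P(revealed), then
# normalize by the full matrix size -> unbiased for the true mean.
ips_mean = (ratings[mask] / propensity[mask]).sum() / ratings.size

print(naive_mean, ips_mean, ratings.mean())
```

With this propensity model the naive mean lands well above the true mean, while the IPS estimate concentrates near it; this is the same reweighting idea that the estimated propensities feed into in debiased matrix completion.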
Optimal Transport with Heterogeneously Missing Data
Linus Bleistein, Aurélien Bellet, Julie Josse
We consider the problem of solving the optimal transport problem between two empirical distributions with missing values. Our main assumption is that the data is missing completely at random (MCAR), but we allow for heterogeneous missingness probabilities across features and across the two distributions. As a first contribution, we show that the Wasserstein distance between empirical Gaussian distributions and linear Monge maps between arbitrary distributions can be debiased without significantly affecting the sample complexity. Secondly, we show that entropic regularized optimal transport can be estimated efficiently and consistently using iterative singular value thresholding (ISVT). We propose a validation set-free hyperparameter selection strategy for ISVT that leverages our estimator of the Bures-Wasserstein distance, which could be of independent interest in general matrix completion problems. Finally, we validate our findings on a wide range of numerical applications.
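The core building block behind ISVT-style estimators is soft singular value thresholding, which shrinks the singular values of an iterate to encourage low rank while repeatedly re-imposing the observed entries. The sketch below shows this generic soft-impute-style iteration on a synthetic low-rank matrix; it is an illustration of the thresholding mechanism under stated assumptions, not the paper's exact ISVT algorithm, and the function names, threshold `tau`, and iteration count are choices made for the example.

```python
# Minimal sketch (not the paper's exact algorithm): soft singular value
# thresholding iterated to fill missing entries of a low-rank matrix.
import numpy as np

def svt(M, tau):
    """Soft-threshold the singular values of M by tau (nuclear-norm prox)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def isvt_complete(X, mask, tau=0.5, n_iter=200):
    """Iteratively impute entries where mask is False: keep observed
    entries fixed, fill the rest with the current low-rank iterate."""
    Z = np.where(mask, X, 0.0)
    for _ in range(n_iter):
        Z = svt(np.where(mask, X, Z), tau)
    return Z

rng = np.random.default_rng(1)
L = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))  # rank-2 truth
mask = rng.random(L.shape) < 0.6                                 # 60% observed
L_hat = isvt_complete(L, mask)
rel_err = np.linalg.norm((L_hat - L)[~mask]) / np.linalg.norm(L[~mask])
print(rel_err)
```

Because the thresholding step is the proximal operator of the nuclear norm, each iteration trades data fit on the observed entries against low rank, which is why the same primitive reappears across general matrix completion problems.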
Reviews: Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption
Since the algorithm for estimating the propensity was proposed by Davenport et al. (2014), the originality of the paper mainly lies in the bounds derivation and the experiments. For the bounds on the bias and overall completion error, there are no direct experiments bridging the proposed theory and practice. I would like more empirical evidence on the assumptions from real-world matrices, beyond the recommendation domain that COAT and MovieLens come from. The novelty of the paper is also less impressive given that the motivation for adopting the nuclear norm is unclear. The experiments only demonstrate that the proposed propensity estimator can achieve results similar to previous classic methods (and can even be slightly worse when the data better fits Naive Bayes or Logistic Regression). The performance gain of the newly proposed estimator on the MovieLens dataset (the largest dataset in the experiments) is not very significant compared with Naive Bayes, meaning that when m and n are large, the bias and completion error are similar to those of Naive Bayes.
Reviews: Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption
This paper addresses the problem of handling missing-not-at-random measurements in matrix completion. This is not a new line of thought in the statistics literature, but this paper nicely bridges the ideas to present them to a NeurIPS audience. That said, it seems the authors are unaware of some key recent work, including the references I list below. In their revision, the authors must look into and include citations from this literature.